Improving Entropy Estimation and the Inference of Genetic Regulatory Networks
Abstract
This paper explores how entropy and other information theoretic quantities may be used to reverse-engineer genetic regulatory networks from repeated microarray data. The problem of differentiating genes that undergo direct coregulation from genes whose expression is similar because they belong to the same regulatory pathway is studied from a graphical modeling viewpoint. This leads to the criterion of conditional independence, which can be evaluated by computing the conditional mutual information. The latter is completely determined by a signed sum of entropies of joint variables, underlining the need for an entropy estimator that is accurate even in low sampling conditions. We introduce a new plug-in entropy estimator obtained by shrinking the maximum likelihood estimates of the multinomial proportions toward the maximum entropy target. We derive the closely related ZIPshrink and ZINBshrink entropy estimators, which enhance the shrinkage estimator by first adjusting the shrinkage target depending on the fraction of structural zeros in the multinomial model. The fraction of structural zeros is estimated using a Zero-Inflated Poisson or Zero-Inflated Negative Binomial distribution to model the histogram of bin counts. We compare these three new estimators to state-of-the-art estimators. We show that they give acceptable estimates even in the low sampling regime and are as accurate as the best estimator available today while being 100 times faster, making them more suitable for large scale computations. We then compare existing approximations of conditional independence networks such as 0-1 networks and a data processing inequality based approach. As a conclusion, we briefly consider limitations of the method as well as issues related to unobserved variables, causal inference and time series as opposed to steady state experiments.
Part I serves both as an introduction and a motivation: it presents the notion of conditional independence and explains why entropy estimation is critical to genetic regulatory network inference. Part II contains the core results of this report: it reviews existing entropy estimators for the discrete case, introduces a new entropy estimator based on the statistical notion of shrinkage, and compares their performance. Finally, Part III compares the data processing inequality based approach to reverse-engineering genetic regulatory networks with the so-called 0-1 networks approach. It also discusses limitations, pitfalls and possible extensions of the method.
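As an illustration of the two ideas in the abstract, here is a minimal sketch in Python with NumPy. It shrinks the maximum likelihood cell frequencies toward the uniform (maximum entropy) target, plugs the result into the entropy formula, and then assembles a conditional mutual information from joint entropies via I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z). The abstract does not state how the shrinkage intensity is chosen; the closed-form James-Stein type intensity used below is an assumption, as are all function names and the [x, y, z] layout of the count table. The ZIPshrink and ZINBshrink variants are not attempted here.

```python
import numpy as np

def shrink_probs(counts):
    """Shrink ML cell frequencies toward the uniform (maximum entropy) target.

    The intensity `lam` uses a James-Stein type closed form; this choice is
    an assumption, not something stated in the abstract.
    """
    counts = np.asarray(counts, dtype=float).ravel()
    n = counts.sum()
    k = counts.size
    target = np.full(k, 1.0 / k)          # maximum entropy target
    if n == 0:
        return target                      # no data: fall back to the target
    p_ml = counts / n                      # maximum likelihood frequencies
    denom = (n - 1.0) * np.sum((target - p_ml) ** 2)
    if denom <= 0.0:
        lam = 1.0                          # n == 1 or p_ml already uniform
    else:
        lam = (1.0 - np.sum(p_ml ** 2)) / denom
        lam = min(1.0, max(0.0, lam))      # keep the intensity in [0, 1]
    return lam * target + (1.0 - lam) * p_ml

def entropy(p):
    """Plug-in Shannon entropy (in nats) of a probability vector."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def shrink_entropy(counts):
    """Entropy of the shrunken cell frequencies."""
    return entropy(shrink_probs(counts))

def conditional_mutual_information(counts_xyz, estimator=shrink_entropy):
    """I(X;Y|Z) from a 3-way count table indexed as [x, y, z] (assumed layout)."""
    c = np.asarray(counts_xyz, dtype=float)
    h_xz = estimator(c.sum(axis=1))        # marginalize out Y
    h_yz = estimator(c.sum(axis=0))        # marginalize out X
    h_xyz = estimator(c)
    h_z = estimator(c.sum(axis=(0, 1)))    # marginalize out X and Y
    return h_xz + h_yz - h_xyz - h_z
```

A value of `conditional_mutual_information` near zero is consistent with X and Y being conditionally independent given Z, which is the criterion the abstract uses to separate direct coregulation from shared-pathway similarity. Passing a different `estimator` makes it possible to compare entropy estimators on the same table.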
Similar resources
Improving the Inference of Gene Expression Regulatory Networks with Data Aggregation Approach
Introduction: The major issue for the future of bioinformatics is the design of tools to determine the functions and all products of single-cell genes. This requires the integration of different biological disciplines as well as sophisticated mathematical and statistical tools. This study revealed that data mining techniques can be used to develop models for diagnosing high-risk or low-risk lif...
Inference of Markov Chain: A Review on Model Comparison, Bayesian Estimation and Rate of Entropy
This article has no abstract.
On the Impact of Entropy Estimation on Transcriptional Regulatory Network Inference Based on Mutual Information
The reverse engineering of transcription regulatory networks from expression data is gaining large interest in the bioinformatics community. An important family of inference techniques is represented by algorithms based on information theoretic measures which rely on the computation of pairwise mutual information. This paper aims to study the impact of the entropy estimator on the quality of th...
H∞ Sampled-Data Controller Design for Stochastic Genetic Regulatory Networks
Artificially regulating gene expression is an important step in developing new treatments for system-level diseases such as cancer. In this paper, we propose a method to regulate gene expression based on sampled-data measurements of gene product concentrations. The inherently noisy behaviour of gene regulatory networks is modeled with stochastic nonlinear differential equations. To synthesize feed...
Publication date: 2006